Semiannual Progress Report 

Covers period 1 May 1994 through 30 November 1994 


As outlined in our continuation proposal 92-ISI-50R (revised) on NASA cooperative agreement 
NCC 2-539, we are (1) developing software, including a system manager and a job manager, that 
will manage available resources and that will enable programmers to develop and execute parallel 
applications in terms of a virtual configuration of processors, hiding the mapping to physical nodes; 
(2) developing communications routines that support the abstractions implemented in item one; (3) 
continuing the development of file and information systems based on the Virtual System Model; 
and (4) incorporating appropriate security measures to allow the mechanisms developed in items 
1 through 3 to be used on an open network. 

The goal throughout our work is to provide a uniform model that can be applied to both parallel 
and distributed systems. We believe that multiprocessor systems should exist in the context of 
distributed systems, allowing them to be more easily shared by those that need them. Our work 
provides the mechanisms through which nodes on multiprocessors are allocated to jobs running 
within the distributed system and the mechanisms through which files needed by those jobs can be 
located and accessed. 

The Prospero Resource Manager 

Conventional techniques for managing resources in parallel systems perform poorly in large 
distributed systems. To manage resources in distributed parallel systems, we have developed resource 
management tools that operate at two levels: allocating system resources to jobs as needed 
(a job is a collection of tasks working together), and separately managing the resources assigned 
to each job. The Prospero Resource Manager (PRM) presents a uniform and scalable model for 
scheduling tasks in parallel and distributed systems. PRM provides the mechanisms through which 
nodes on multiprocessors can be allocated to jobs running within an extremely large distributed 
system. 

The common approach of using a single resource manager to manage all resources in a large system 
is not practical. As the system grows, a single resource manager becomes a bottleneck. Even within 
large local multiprocessor systems the number of resources to be managed can adversely affect 
performance. As a distributed system scales geographically and administratively, additional 
problems arise, such as latency, trust and security. 

PRM addresses the bottleneck problems by using multiple resource managers, each controlling a 
subset of the resources in the system, independent of other managers of the same type. The functions 
of resource management are distributed across three types of managers: system managers, job 
managers, and node managers. The complexity of these management roles is reduced because each 
is designed to utilize information at an appropriate level of abstraction. 

During the reporting period, we extended PRM's resource allocation and release policies. Depending 
on its configuration, the system manager may allocate a node to a job for the entire duration of the 
job's execution, or for executing a designated set of tasks. The former policy is more efficient for 
jobs in which tasks are dynamically spawned. The latter policy enables the system manager to 
preempt nodes from a job, and force its tasks to checkpoint (for tasks capable of checkpointing). It 
is then the job manager's responsibility to find an alternate set of nodes to migrate its tasks to. The 
implementation allows for a default policy to be configured to suit the users’ requirements. 
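
The two allocation policies and the preemption path described above can be illustrated with a 
minimal sketch. Every name in it (Policy, SystemManager, JobManager, allocate, preempt, 
on_preempt) is a hypothetical stand-in chosen for illustration and is not PRM's actual interface; 
checkpointing is reduced to a comment. 

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    WHOLE_JOB = "whole_job"  # node is held for the job's entire execution
    TASK_SET = "task_set"    # node is held only for a designated set of tasks


@dataclass
class SystemManager:
    """Hypothetical system manager that leases nodes to jobs."""
    free: list
    default_policy: Policy = Policy.WHOLE_JOB
    leases: dict = field(default_factory=dict)  # node -> (job, policy)

    def allocate(self, job, count, policy=None):
        policy = policy or self.default_policy
        if count > len(self.free):
            raise RuntimeError("not enough free nodes")
        nodes = [self.free.pop(0) for _ in range(count)]
        for node in nodes:
            self.leases[node] = (job, policy)
        return nodes

    def preempt(self, node):
        """Take a node back from its job; only task-set leases are preemptible."""
        job, policy = self.leases[node]
        if policy is not Policy.TASK_SET:
            raise PermissionError("node is held for the whole job")
        del self.leases[node]  # the node is now reserved for another use
        return job


@dataclass
class JobManager:
    """Hypothetical job manager that tracks where its tasks run."""
    name: str
    placement: dict = field(default_factory=dict)  # task -> node

    def on_preempt(self, node, system):
        """Move every task on the preempted node to a newly allocated node."""
        displaced = [t for t, n in self.placement.items() if n == node]
        for task in displaced:
            # A real task would be checkpointed before it is restarted.
            (new_node,) = system.allocate(self.name, 1, Policy.TASK_SET)
            self.placement[task] = new_node
        return displaced
```

In this sketch the default policy is a constructor argument, mirroring the configurable default 
mentioned above, and a node leased under the whole-job policy cannot be preempted. 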

To debug PRM applications we have developed debugging tools consisting of a command interface 
at the front-end and task-monitors at the back-end. The interactive front-end enables the application 
programmer to monitor and control program execution. At the back-end each target task is controlled 
by a separate task monitor that is co-located with the target. We have adapted the GNU Debugger 
(gdb) to function as the task monitor and interact with the front end. Use of gdb gives the task 
monitor all the features of a traditional debugger. 

We have also added support for playback debugging using traces. When applications are linked 
with an instrumented version of the communication library, the communication activity of the 
program can be captured in trace files. Invoking ’replay’ at the command interface causes 
task-monitors to use these trace files to replay programs and exactly recreate the sequence of events in 
a program’s history. Work is underway to incorporate checkpointing with debugging so that 
programs can be replayed from intermediate points in their execution histories. 
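
One common way to make a message-passing program replayable is to record the order in which 
messages were received and to enforce that order on a later run. The sketch below illustrates that 
general technique only; the class names and the JSON trace format are assumptions made for this 
example and do not describe PRM's trace files or its instrumented library. 

```python
import json
from collections import deque


class RecordingMailbox:
    """Delivers messages in arrival order and logs the sender of each receive."""

    def __init__(self):
        self.queue = deque()
        self.trace = []  # senders, in the order their messages were received

    def deliver(self, sender, payload):
        self.queue.append((sender, payload))

    def receive(self):
        sender, payload = self.queue.popleft()
        self.trace.append(sender)
        return sender, payload

    def dump_trace(self):
        return json.dumps(self.trace)


class ReplayMailbox:
    """Delivers messages in the order given by a trace, not in arrival order."""

    def __init__(self, trace_text):
        self.expected = deque(json.loads(trace_text))
        self.pending = {}  # sender -> deque of payloads not yet received

    def deliver(self, sender, payload):
        self.pending.setdefault(sender, deque()).append(payload)

    def receive(self):
        sender = self.expected[0]
        if not self.pending.get(sender):
            # The message the trace calls for has not arrived yet.
            raise LookupError(f"waiting for message from {sender}")
        self.expected.popleft()
        return sender, self.pending[sender].popleft()
```

During replay, messages may arrive in a different order than in the original run; the replay 
mailbox holds them back until the trace calls for them, so the task observes the same sequence 
of receives. 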

We have updated PRM’s libraries to fully support the interprocess communication interface provided 
by the current release of PVM (Version 3.3.5). A group-server process now handles collective 
operations such as broadcast, barrier synchronization and global reduction. The group server is 
automatically spawned by the job manager when a group operation is first invoked by one of the 
tasks. 
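
The on-demand start of the group server can be pictured with the following minimal sketch. It does 
not use the PVM interface; GroupServer, LazyJobManager, and their methods are hypothetical names 
introduced for illustration, and the "spawn" is simply object creation within one process. 

```python
class GroupServer:
    """Hypothetical group server handling barriers and sum reductions."""

    def __init__(self):
        self.arrived = {}  # barrier name -> set of tasks that have arrived
        self.partial = {}  # reduction name -> list of contributed values

    def barrier(self, name, task, group_size):
        """Return True once every task in the group has reached the barrier."""
        tasks = self.arrived.setdefault(name, set())
        tasks.add(task)
        return len(tasks) == group_size

    def reduce_sum(self, name, value, group_size):
        """Return the global sum once all contributions are in, else None."""
        values = self.partial.setdefault(name, [])
        values.append(value)
        return sum(values) if len(values) == group_size else None


class LazyJobManager:
    """Hypothetical job manager that starts the group server on demand."""

    def __init__(self):
        self._server = None
        self.spawn_count = 0

    def group_server(self):
        if self._server is None:  # first group operation by any task
            self._server = GroupServer()
            self.spawn_count += 1
        return self._server
```

Jobs that never invoke a group operation never pay for a group server in this arrangement, and 
all later group operations reuse the one instance. 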

We have continued the development of PRM software and documentation and are preparing it for 
release to users outside ISI. This release will include portions of the Condor package that PRM uses 
for checkpointing, patches that enable gdb to function as a task-monitor, a configuration script that 
enables users to easily configure and build PRM on Sun3, Sparc and HP7xx platforms with the 
desired options (such as checkpointing and playback debugging), and a start-up script for setting 
up PRM environments. 

After further refinement, a final software release will terminate our work on the Prospero Resource 
Manager under the DIVIRS project. We will continue to support the software release and further 
the development of PRM under new projects. 

The Prospero File System and Directory Service 

During the reporting period, we continued development of the Prospero File System and Directory 
Service, a file system and directory service based on the Virtual System Model. As in the previous 
reporting period, most of our development was directed toward moving Prospero from a prototype 
to a production system. This included a new database format, revisions to the application programming 
interfaces to provide a more consistent interface for files and directory objects, and adoption of a 
uniform method for configuring clients. 

Steven Augart implemented a new database/directory format for Prospero that combines the attribute 
information previously associated with a file or object with the information associated with a 
directory. Sung-Wook Ryu developed a database module that allows information in the format just 
described to be stored in a common dbm database. The first change reduces the storage and i-nodes 
required to maintain information on a Prospero server and improves performance. The dbm 
extension further reduces storage requirements, but locking and reliability issues make it suitable for 
only certain applications. 

Sio-Man Cheang developed a configuration package for Prospero that provides functionality similar 
to that provided by the X Window System. All user-runnable commands call the configuration 
package to determine configuration parameters for network communication, gateways, debugging, 
priorities, and security and payment options. The configuration package determines the configured 
values by reading, in order, command line options, user-specific and system-specific configuration 
files, and compile-time definitions. 
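
The lookup order can be sketched as a chain of mappings, assuming that a value found in an earlier 
source takes precedence over the same parameter in a later one. The parameter names and default 
values below are invented for illustration and are not Prospero's actual configuration keys. 

```python
from collections import ChainMap

# Hypothetical compile-time definitions, used only when nothing else sets a value.
COMPILE_TIME_DEFAULTS = {
    "debug_level": "0",
    "gateway": "none",
    "priority": "normal",
}


def resolve_config(cli_options, user_file, system_file):
    """Return a view that searches the sources in order of precedence."""
    return ChainMap(cli_options, user_file, system_file, COMPILE_TIME_DEFAULTS)


config = resolve_config(
    cli_options={"debug_level": "2"},
    user_file={"debug_level": "1", "gateway": "gw.example.org"},
    system_file={"gateway": "site-gw.example.org", "priority": "low"},
)
```

Here debug_level resolves from the command line, gateway from the user-specific file, and 
priority from the system-specific file; a parameter set nowhere else falls back to the 
compile-time definition. 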

Electronic Commerce 

As part of an AASERT award attached to the DIVIRS contract, and in conjunction with our efforts 
on a separate contract for Security Infrastructure, Gennady (Ari) Medvinsky has been working on 
electronic payment mechanisms for the Internet. As part of this effort, a prototype implementation 
of NetCheque has been released on the Internet. Users registered with a NetCheque accounting server 
are able to write checks to other users. When deposited, the check authorizes the transfer of account 
balances from the account against which the check was drawn to the account to which the check 
was deposited. Work is presently underway to integrate NetCheque with Prospero, so that payment 
for information services can be processed automatically as information is retrieved. 
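
The effect of depositing a check can be illustrated with a small bookkeeping sketch. It models only 
the balance transfer described above; the Check and AccountingServer names are hypothetical, 
amounts are plain integers, and the signing and verification of checks that a real payment system 
requires are deliberately left out. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    number: int
    payer: str   # account against which the check is drawn
    payee: str   # account allowed to deposit the check
    amount: int  # in the smallest currency unit


class AccountingServer:
    """Hypothetical accounting server holding account balances."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.cleared = set()  # (payer, number) pairs already deposited

    def deposit(self, check, depositor):
        if depositor != check.payee:
            raise PermissionError("check is not payable to the depositor")
        key = (check.payer, check.number)
        if key in self.cleared:
            raise ValueError("check has already been deposited")
        if self.balances[check.payer] < check.amount:
            raise ValueError("insufficient funds")
        # Transfer the balance from the payer's account to the depositor's.
        self.balances[check.payer] -= check.amount
        self.balances[depositor] = self.balances.get(depositor, 0) + check.amount
        self.cleared.add(key)
```

Tracking cleared check numbers keeps the same check from being deposited twice in this sketch. 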

Publications 

The following paper appeared during the reporting period. 

B. Clifford Neuman and Santosh Rao. The Prospero Resource Manager: A scalable framework for 
processor allocation in distributed systems. Concurrency: Practice and Experience, Summer 1994 
(copy attached). 

APPENDIX A - GLOSSARY 


AASERT Augmentation Award for Science and Research Training 

ARPA Advanced Research Projects Agency 

DIVIRS Distributed Virtual Systems 

GDB GNU Debugger 

ISI Information Sciences Institute 

OCSG Open Computing Security Group 

PRM Prospero Resource Manager 

PVM Parallel Virtual Machine 

USC University of Southern California 